    How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies

    Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a benchmark, we discuss the role that the discount factor may play in the quality of the learning process of a deep Q-network (DQN). When the discount factor progressively increases up to its final value, we empirically show that it is possible to significantly reduce the number of learning steps. When used in conjunction with a varying learning rate, we empirically show that it outperforms the original DQN on several experiments. We relate this phenomenon to the instabilities of neural networks when they are used in an approximate Dynamic Programming setting. We also describe the possibility of falling into a local optimum during the learning process, thus connecting our discussion with the exploration/exploitation dilemma. Comment: NIPS 2015 Deep Reinforcement Learning Workshop
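    A progressively increasing discount factor can be sketched as a simple per-epoch schedule. The abstract does not give the update rule, so the geometric shrinking of 1 - gamma below, the function name `discount_schedule`, and all numeric values are illustrative assumptions, not the authors' published settings.

```python
def discount_schedule(gamma_start, gamma_final, shrink, n_epochs):
    """Return one discount factor per epoch, rising towards gamma_final.

    Assumed rule (hypothetical, for illustration only):
        gamma_next = 1 - shrink * (1 - gamma), capped at gamma_final
    with 0 < shrink < 1, so the gap to 1 shrinks geometrically.
    """
    gammas = []
    gamma = gamma_start
    for _ in range(n_epochs):
        gammas.append(gamma)
        # Shrink the distance to 1, but never exceed the final value.
        gamma = min(gamma_final, 1.0 - shrink * (1.0 - gamma))
    return gammas


if __name__ == "__main__":
    schedule = discount_schedule(0.9, 0.99, 0.98, 200)
    print(schedule[0], schedule[1], schedule[-1])
```

    In a training loop, the value for the current epoch would replace the fixed discount factor in the Q-learning target; the schedule itself is independent of the network.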

    Min Max Generalization for Two-stage Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes

    We study the minmax optimization problem introduced in [22] for computing policies for batch mode reinforcement learning in a deterministic setting. First, we show that this problem is NP-hard. In the two-stage case, we provide two relaxation schemes. The first relaxation scheme works by dropping some constraints in order to obtain a problem that is solvable in polynomial time. The second relaxation scheme, based on a Lagrangian relaxation where all constraints are dualized, leads to a conic quadratic programming problem. We also theoretically prove and empirically illustrate that both relaxation schemes provide better results than those given in [22].

    Benchmarking for Bayesian Reinforcement Learning

    In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment, using some prior knowledge that is accessed beforehand. Many BRL algorithms have already been proposed, but even though a few toy examples exist in the literature, there are still no extensive or rigorous benchmarks to compare them. The paper addresses this problem, and provides a new BRL comparison methodology along with the corresponding open source library. In this methodology, a comparison criterion is defined that measures the performance of algorithms on large sets of Markov Decision Processes (MDPs) drawn from some probability distributions. In order to enable the comparison of non-anytime algorithms, our methodology also includes a detailed analysis of the computation time requirement of each algorithm. Our library is released with all source code and documentation: it includes three test problems, each of which has two different prior distributions, and seven state-of-the-art RL algorithms. Finally, our library is illustrated by comparing all the available algorithms and the results are discussed. Comment: 37 pages

    On overfitting and asymptotic bias in batch reinforcement learning with partial observability

    This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally shows that, while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smart grids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context. Comment: Accepted at the Journal of Artificial Intelligence Research (JAIR), 31 pages

    Cybersecurity in Power Grids: Challenges and Opportunities

    Increasing volatilities within power transmission and distribution force power grid operators to amplify their use of communication infrastructure to monitor and control their grid. The resulting increase in communication creates a larger attack surface for malicious actors. Indeed, cyber attacks on power grids have already succeeded in causing temporary, large-scale blackouts in the recent past. In this paper, we analyze the communication infrastructure of power grids to derive the resulting fundamental cybersecurity challenges for power grids. Based on these challenges, we identify a broad set of attack vectors and attack scenarios that threaten the security of power grids. To address these challenges, we propose to rely on a defense-in-depth strategy, which encompasses measures for (i) device and application security, (ii) network security, and (iii) physical security, as well as (iv) policies, procedures, and awareness. For each of these categories, we distill and discuss a comprehensive set of state-of-the-art approaches, and identify further opportunities to strengthen cybersecurity in interconnected power grids.

    Thin crystalline macroporous silicon solar cells with ion implanted emitter

    We separate a (34 ± 2) μm-thick macroporous Si layer from an n-type Si wafer by means of electrochemical etching. The porosity is p = (26.2 ± 2.4)%. We use ion implantation to selectively dope the outer surfaces of the macroporous Si layer. No masking of the surface is required. The pores are open during the implantation process. We fabricate a macroporous Si solar cell with an implanted boron emitter at the front side and an implanted phosphorus region at the rear side. The short-circuit current density is 34.8 mA cm⁻² and the open-circuit voltage is 562 mV. With a fill factor of 69.1%, the cell achieves an energy-conversion efficiency of 13.5%. Federal Ministry for Environment, Nature Conservation, and Nuclear Safety/FKZ 032514
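    The reported efficiency can be cross-checked from the other three cell parameters, since efficiency equals short-circuit current density times open-circuit voltage times fill factor, divided by the incident power density. The abstract does not state the illumination, so the 100 mW cm⁻² default below (standard one-sun test conditions) is an assumption, as is the helper name `conversion_efficiency`.

```python
def conversion_efficiency(jsc_ma_cm2, voc_v, fill_factor, p_in_mw_cm2=100.0):
    """Return the energy-conversion efficiency in percent.

    jsc_ma_cm2  : short-circuit current density in mA/cm^2
    voc_v       : open-circuit voltage in V
    fill_factor : fill factor as a fraction (0..1)
    p_in_mw_cm2 : incident power density in mW/cm^2
                  (assumed 100 mW/cm^2; not stated in the abstract)
    """
    # mA/cm^2 * V = mW/cm^2, so p_max is the maximum output power density.
    p_max = jsc_ma_cm2 * voc_v * fill_factor
    return 100.0 * p_max / p_in_mw_cm2


if __name__ == "__main__":
    eta = conversion_efficiency(34.8, 0.562, 0.691)
    print(f"{eta:.1f} %")
```

    Under that assumption the three reported values are mutually consistent with the stated efficiency to one decimal place.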

    Multiple Slips in Atomic-Scale Friction: An Indicator for the Lateral Contact Damping

    The occurrence of multiple jumps in 2D atomic-scale friction measurements is used to quantify the viscous damping accompanying the stick-slip motion of a sharp tip in contact with a NaCl(001) surface. Multiple slips are observed without apparent wear for normal forces between 13 and 91 nN. For scans parallel to [100] directions, the tip jumps between minima of the substrate corrugation potential in a zigzag fashion. An algorithm is applied to determine histograms of lateral force jumps which characterize multiple slips. The same algorithm is used to classify multiple slips occurring in calculated lateral force maps. Comparisons between simulations and experiments indicate that the nanometer-sized contact is underdamped at intermediate loads (13-26 nN) and becomes slightly overdamped at higher loads. The proposed procedure is a novel way to estimate the lateral contact damping which plays an important role in the interpretation of measurements of the velocity and temperature dependence of friction, of slip duration, and of the reduction of friction by applied perpendicular or parallel oscillation.

    A bibliometric analysis of orthogeriatric care: top 50 articles.

    BACKGROUND: The population is ageing and orthogeriatric care is an emerging research topic. PURPOSE: This bibliometric review aims to provide an overview of the most influential literature in the field of orthogeriatric care and to investigate the status and trends of research in that field. METHODS: From the Core Collection databases in the Thomson Reuters Web of Knowledge, the most influential original articles with reference to orthogeriatric care were identified in December 2020 using a multistep approach. A total of 50 articles were included and analysed in this bibliometric review. RESULTS: The 50 most cited articles were published between 1983 and 2017 in 34 different journals. The number of total citations per article ranged from 34 to 704 (mean citations per article: n = 93). In the majority of publications, geriatricians (62%) accounted for the first authorship, followed by others (20%) and (orthopaedic) surgeons (18%). Articles mostly originated from Europe (76%), followed by Asia-Pacific (16%) and Northern America (8%). Key countries (UK, Sweden, and Spain) and a key topic (hip fracture) are the main drivers of orthogeriatric research. The majority of articles reported on therapeutic studies (62%). CONCLUSION: This bibliometric review acknowledges recent research. Orthogeriatric care is an emerging research topic to which surgeons have the potential to contribute, and related topics such as intraoperative procedures, fractures other than hip fractures, or elective surgery have the potential to widen the field of research.